Text corpus

In linguistics and natural language processing, a corpus (pl.: corpora) or text corpus is a dataset of language resources, either natively digital or digitized from older material, and either annotated or unannotated.

Annotated corpora have been used in corpus linguistics for statistical hypothesis testing, checking occurrences, and validating linguistic rules within a specific language territory.
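
As an illustration of checking occurrences and validating a rule against an annotated corpus, the following is a minimal sketch in Python. The toy corpus, the tag names (DET, ADJ, NOUN, VERB), the rule itself, and the function names token_counts and rule_violations are all assumptions made for this example; they are not taken from any particular corpus or toolkit.

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (token, part-of-speech tag) pairs.
ANNOTATED_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]


def token_counts(corpus):
    """Count how often each token occurs across all sentences."""
    return Counter(token for sentence in corpus for token, _tag in sentence)


def rule_violations(corpus):
    """Check an assumed rule: a determiner is followed by an adjective or a noun.

    Returns a list of (sentence_index, determiner, following_tag) for every
    determiner whose next token carries any other tag.
    """
    violations = []
    for i, sentence in enumerate(corpus):
        # Pair each tagged token with the one that follows it.
        for (token, tag), (_next_token, next_tag) in zip(sentence, sentence[1:]):
            if tag == "DET" and next_tag not in {"ADJ", "NOUN"}:
                violations.append((i, token, next_tag))
    return violations


if __name__ == "__main__":
    print(token_counts(ANNOTATED_CORPUS)["dog"])
    print(rule_violations(ANNOTATED_CORPUS))
```

On a real corpus the same two steps apply at scale: the counts feed a statistical test, and the list of violations shows where a proposed rule fails.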

In search technology, a corpus is the collection of documents which is being searched.
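
A minimal sketch of this sense of the word, assuming a small in-memory collection: the corpus is a mapping from document identifiers to text, and an inverted index records which documents contain each word. The document contents and the names DOCUMENTS, build_index, and search are hypothetical, and whitespace tokenization is a simplification that real search systems do not rely on.

```python
from collections import defaultdict

# Hypothetical corpus: the collection of documents being searched.
DOCUMENTS = {
    "doc1": "a corpus is a collection of texts",
    "doc2": "search engines index a collection of documents",
    "doc3": "annotated texts support linguistic research",
}


def build_index(documents):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the posting sets; an unknown word yields an empty set.
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


if __name__ == "__main__":
    index = build_index(DOCUMENTS)
    print(search(index, "collection of"))
```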

